Accrual Failure Detectors

Author

  • Naohiro Hayashibara
Abstract

Failure detection is a fundamental building block for ensuring fault tolerance in distributed systems. For this reason, many people have advocated that failure detection should be provided as some form of service [1, 2, 3, 4, 5], similar to IP address lookup (DNS) or time synchronization (e.g., NTP). Unfortunately, in spite of important technical breakthroughs, this view has met with little success so far. We believe that one of the main reasons is the conventional binary interaction (i.e., trust vs. suspect), which makes it difficult to meet the requirements of several distributed applications running simultaneously. We therefore advocate a different abstraction that helps decouple application requirements from issues related to the underlying system. It is well known that there is an inherent tradeoff between (1) conservative failure detection (i.e., reducing the risk of wrongly suspecting a running process) and (2) aggressive failure detection (i.e., quickly detecting the occurrence of a real crash). There exists a continuum of valid choices between these two extremes, and what defines an appropriate choice is strongly related to application requirements.
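To make the decoupling concrete, here is a minimal Python sketch, not the authors' implementation: a single detector exports a real-valued suspicion level, and each application picks its own point on the conservative/aggressive continuum by choosing a threshold. The class `SimpleAccrualDetector`, the function `suspects`, and the suspicion formula (time since the last heartbeat divided by an assumed heartbeat period) are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical sketch: one accrual-style detector shared by several applications.
# The suspicion formula (elapsed time / expected heartbeat period) is an
# illustrative assumption, not the method described in the abstract.
from dataclasses import dataclass


@dataclass
class SimpleAccrualDetector:
    expected_interval: float      # assumed heartbeat period, in seconds
    last_heartbeat: float = 0.0   # arrival time of the most recent heartbeat

    def heartbeat(self, now: float) -> None:
        """Record the arrival time of a heartbeat from the monitored process."""
        self.last_heartbeat = now

    def suspicion_level(self, now: float) -> float:
        """Return a non-negative value that keeps growing while no heartbeat arrives."""
        return max(0.0, now - self.last_heartbeat) / self.expected_interval


def suspects(level: float, threshold: float) -> bool:
    """Each application turns the shared level into its own binary decision."""
    return level >= threshold


detector = SimpleAccrualDetector(expected_interval=1.0)
detector.heartbeat(now=10.0)
level = detector.suspicion_level(now=13.0)  # 3 s without a heartbeat

aggressive = suspects(level, threshold=2.0)    # favors quick detection
conservative = suspects(level, threshold=8.0)  # favors fewer wrong suspicions
print(level, aggressive, conservative)  # 3.0 True False
```

Both applications read the same value; only their thresholds differ, so tuning one application's tradeoff does not affect the other.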

Similar resources

On Accrual Failure Detectors

Traditionally, failure detectors have considered a binary model whereby a given process can be either trusted or suspected. This paper defines a family of failure detectors, called accrual failure detectors, that revisits this interaction model. Accrual failure detectors associate with each process a real value representing a suspicion level. An important advantage of accrual failure detectors ov...

Definition and properties of accrual failure detectors : an overview

Ensuring fast and accurate failure detection is a fundamental issue for building efficient fault-tolerant distributed systems. In an effort to make fault-tolerant applications easier to implement, we are trying to provide failure detection as a generic Internet service, similar to what was done very successfully with NTP (network time protocol) for clock synchronization. To do so, we must revis...

A Weibull distribution accrual failure detector for cloud computing

Failure detectors are a fundamental component for building highly available distributed systems. To meet the requirements of complicated large-scale distributed systems, accrual failure detectors that can adapt to multiple applications have been studied extensively. However, several implementations of accrual failure detectors do not adapt well to the cloud service environment. To solve ...

The Φ Accrual Failure Detector

Detecting failures is a fundamental issue for fault tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far. One of the reasons is the difficulty of satisfying several application requirements simultaneo...
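The φ accrual detector, as it is commonly described, expresses suspicion on a logarithmic scale: φ = −log10(P_later(t_now − T_last)), where P_later is the probability that the next heartbeat arrives later than the time already elapsed since the last one. The Python sketch below assumes that heartbeat inter-arrival times follow a normal distribution whose mean and variance are estimated over a sliding window; the class name, the window size of 100, and the `min_std` floor are illustrative assumptions rather than parameters taken from the paper.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Sketch of a phi-style accrual failure detector (illustrative only).

    Assumes heartbeat inter-arrival times are normally distributed, with mean
    and variance estimated over a sliding window of recent intervals.
    """

    def __init__(self, window_size: int = 100, min_std: float = 1e-3):
        self.intervals = deque(maxlen=window_size)  # recent inter-arrival times
        self.last_arrival = None                     # time of the latest heartbeat
        self.min_std = min_std  # floor so perfectly regular heartbeats do not divide by zero

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat and the interval since the previous one."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now: float) -> float:
        """Return -log10(P_later(now - last_arrival)) under the normal model."""
        if self.last_arrival is None or not self.intervals:
            return 0.0  # no history yet: express no suspicion
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        std = max(math.sqrt(sum((x - mean) ** 2 for x in self.intervals) / n), self.min_std)
        elapsed = now - self.last_arrival
        # P_later = 1 - CDF(elapsed) for N(mean, std^2), written with erfc.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))  # guard against log10(0)


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.1, 3.0, 4.1):  # hypothetical heartbeat arrival times, in seconds
    detector.heartbeat(t)
for now in (4.6, 5.1, 6.0):
    print(now, round(detector.phi(now), 2))
```

An application could then suspect the process once φ exceeds a threshold of its own choosing. Because P_later = 10^(−φ) under the model's assumptions, a threshold of 1 roughly corresponds to a 10% chance that the suspicion is a mistake, a threshold of 2 to 1%, and so on.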

Low-Overhead Accrual Failure Detector

Failure detectors are one of the fundamental components for building a distributed system with high availability. In order to maintain the efficiency and scalability of failure detection in a complicated large-scale distributed system, accrual failure detectors that can adapt to multiple applications have been studied extensively. In this paper, a new accrual failure detector, LA-FD, with low s...

LA-FD: A Low-Overhead Accrual Failure Detector

Failure detectors are one of the fundamental components for building a distributed system with high availability. In order to maintain the efficiency and scalability of failure detection in a complicated large-scale distributed system, accrual failure detectors that can adapt to multiple applications have been studied extensively. In this paper, an accrual failure detector, LA-FD, with low system o...

Publication date: 2004